Abstract.

Finding interesting association rules is an important and active research field in
data mining. The algorithms of the Apriori family rely on two rule extraction
measures, support and confidence. Although these two measures are algorithmically
fast, they generate a prohibitive number of rules, most of which are redundant and
irrelevant. Further measures are therefore needed to filter out uninteresting rules.
Many survey studies have examined interestingness measures from several points of
view. Several of them have sought to identify "good" properties of rule extraction
measures, and these properties have been assessed on 61 measures. The purpose of
this paper is twofold: first, to extend the number of measures and properties
studied and to formalize the properties proposed in the literature; second, in the
light of this formal study, to categorize the studied measures. The paper thus
identifies categories of measures that help users select one or more appropriate
measures during the knowledge extraction process. Evaluating the properties on the
61 measures enabled us to identify 7 classes of measures, which we obtained using
two different clustering techniques.
1. Introduction

Association rule mining algorithms (Agrawal and Srikant, 1994), which are based on
the support and confidence measures, tend to generate a large number of rules.
These two measures are not sufficient to extract only the really interesting rules,
as highlighted in many studies such as (Sese and Morishita, 2002) and (Carvalho et
al, 2005). An additional step of analyzing the extracted rules is therefore
essential, and different solutions have been proposed. A first solution consists of
presenting the extracted information in an accessible and synthetic way through
visual representation techniques (Hofmann and Wilhelm, 2001), (Blanchard et al,
2003). A second solution is to reduce the number of rules. Some authors (Zaki,
2000), (Zaman Ashrafi et al, 2004), (Ben Yahia et al, 2009) eliminate redundant
rules; others evaluate and rank the rules according to interestingness measures
(Lenca et al, 2008). In this paper, we focus on the latter path: the use of
interestingness measures to eliminate uninteresting rules. Many survey studies
compared the objective measures reported in the literature from several points of
view, in particular the underlying properties of a "good" interestingness measure
(Tan et al, 2002), (Lallich and Teytaud, 2004), (Vaillant, 2006), (Geng and
Hamilton, 2007), (Feno, 2007), (Heravi and Zaiane, 2010). These surveys covered
only some of the interestingness measures reported in the literature and only some
of the proposed properties.

The purpose of this paper is twofold: first, to extend the number of measures and
properties studied and to formalize the different properties proposed in the
literature; second, in the light of this formal study, which evaluates the
interestingness measures against the "good" properties, to categorize the studied
measures and to interpret the detected classes. We thus wish to detect groups of
measures with similar properties, allowing the user, on the one hand, to restrict
the number of measures to choose from and, on the other hand, to guide the choice
according to the properties the measures are expected to satisfy.

We therefore look for classes of measures that behave similarly with respect to all
the properties we have identified. We do not intend to explain the properties and
measures identified in the literature; such explanations can be found in the review
articles (Tan et al, 2002), (Lallich and Teytaud, 2004), (Vaillant, 2006), (Geng
and Hamilton, 2007), (Feno, 2007). The search for these classes was performed using
two well-known techniques: agglomerative hierarchical clustering with the Ward
criterion (Ward, 1963) and a version of the non-hierarchical k-means clustering
method (MacQueen, 1967). A consensus is then derived from the results obtained with
both techniques. Before searching for classes, it was essential to check whether
the matrix of measures x properties could be simplified, by looking for groups of
measures that behave identically on all the properties and by checking whether
any properties are redundant.

The article is organized as follows. Section 3 presents and formalizes the
different properties. Section 4 outlines the matrix of measures x properties on
which we look for classes and studies whether it can be simplified. Section 6
presents the results of the classification obtained by the first technique,
agglomerative hierarchical clustering using the Ward criterion. Section 7 gives the
results generated by the second technique, a version of the non-hierarchical
k-means clustering method, and discusses the consistency of the results obtained by
both techniques. A consensus classification is then derived (Section 8). Finally,
Section 9 tries to give a meaning to some of the extracted classes and validates
the retained classification against those published by (Vaillant, 2006), (Le Bras,
2011), (Huynh et al, 2007), (Lesot and Rifqi, 2010), (Zighed et al, 2011). The
article ends with a conclusion and perspectives.



2. Association rules 

As defined in (Agrawal et al., 1993), let $I = \{i_1, \ldots, i_k\}$ be a set of
$k$ items and $B = \{b_1, \ldots, b_n\}$ a basket database, i.e. a collection of
$n$ subsets of $I$. An association rule in the database $B$ is a formula

$$X \Rightarrow Y$$

where $X$ and $Y$ are sets of items from $I$, i.e. $X, Y \subseteq I$ with
$X \cap Y = \emptyset$. $X$ is the antecedent or premise of the rule and $Y$ its
consequent or conclusion.

A natural interestingness measure of association rules is based on the notions
of support and confidence. A rule $X \Rightarrow Y$ has support $s$ when $X$ and
$Y$ occur together in at least $s\%$ of the $n$ baskets, and confidence $c$ when,
among all the baskets containing $X$, at least $c\%$ also contain $Y$. Formally,

$$Supp(X \Rightarrow Y) = Supp(X \cup Y) \quad \text{and} \quad Conf(X \Rightarrow Y) = \frac{Supp(X \cup Y)}{Supp(X)}$$

An association rule is considered interesting if its confidence and support 
exceed some user-specified thresholds. 
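
As an illustration of these two definitions, the following minimal Python sketch computes support and confidence on a toy basket database. The item names and the `BASKETS` list are hypothetical and serve only as an example; they do not come from the data studied in this paper.

```python
from typing import FrozenSet, Iterable, List

def support(baskets: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of baskets that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(baskets: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Supp(X u Y) / Supp(X). Raises ZeroDivisionError if X never occurs."""
    x = frozenset(antecedent)
    return support(baskets, x | frozenset(consequent)) / support(baskets, x)

# Hypothetical toy database of n = 4 baskets (illustration only).
BASKETS = [frozenset(b) for b in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

supp_rule = support(BASKETS, {"bread", "milk"})        # bread and milk together
conf_rule = confidence(BASKETS, {"bread"}, {"milk"})   # rule bread => milk
```

On this toy database the rule bread => milk is supported by two of the four baskets, and two of the three baskets containing bread also contain milk.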

However, the support-confidence approach has some weaknesses. This approach, as
well as the algorithms based on it, often leads to the extraction of an
exponential number of rules, which an expert cannot realistically validate. In
addition, many potentially interesting rules have a low support value and can
therefore be eliminated by the pruning threshold minsupp. To address this problem,
many other measures of interestingness have been proposed in the literature (Geng
and Hamilton, 2007), mainly because they are effective for mining potentially
interesting rules and capture some aspects of user interest. The most important of
those measures are the subject of our analysis and are surveyed in the appendix.
However, the concept of association rule itself, as well as various measures of
interestingness, are particular cases of what is investigated in depth in (Hajek
and Havranek, 1978), a book that develops the logico-statistical foundations of the
GUHA method (Hajek and Holena, 2010).



3. Recall and formalization of the properties 

This section recalls the properties currently used in the literature to
characterize measures, and then formalizes them for a better understanding. The
properties are summarized in Table 1.

We first give some details about the terminology used in Table 1.
Table 1. Properties of measures m

N°     Properties
P1     The measure m is asymmetric ($P_1(m) = 1$) or symmetric ($P_1(m) = 0$).
P2     m does not equalize the antinomic rules ($P_2(m) = 1$) or equalizes them
       ($P_2(m) = 0$).
P3     m assesses in the same way the rules $X \to Y$ and
       $\overline{Y} \to \overline{X}$ in the logical implication case
       ($P_3(m) = 1$) or not ($P_3(m) = 0$).
P4     m increases according to the number of examples ($P_4(m) = 1$) or
       decreases ($P_4(m) = 0$).
P5     m increases according to the size of the training set ($P_5(m) = 1$) or
       not ($P_5(m) = 0$).
P6     m decreases according to the consequent size ($P_6(m) = 1$) or increases
       ($P_6(m) = 0$).
P7     m has a fixed value in the independence case ($P_7(m) = 1$) or not
       ($P_7(m) = 0$).
P8     m has a fixed value in the logical implication case ($P_8(m) = 1$) or not
       ($P_8(m) = 0$).
P9     m has a fixed value in the equilibrium case ($P_9(m) = 1$) or not
       ($P_9(m) = 0$).
P10    Identified values in the attraction case between X and Y
       ($P_{10}(m) = 1$) or not ($P_{10}(m) = 0$).
P11    Identified values in the repulsion case between X and Y
       ($P_{11}(m) = 1$) or not ($P_{11}(m) = 0$).
P12    m is tolerant to the first counter-examples ($P_{12}(m) = 2$), not
       tolerant ($P_{12}(m) = 0$) or indifferent ($P_{12}(m) = 1$).
P13    m is invariant in case of expansion of certain quantities
       ($P_{13}(m) = 1$) or not ($P_{13}(m) = 0$).
P14    m opposes the rules $X \to Y$ and $\overline{X} \to Y$
       ($P_{14}(m) = 1$) or not ($P_{14}(m) = 0$).
P15    m opposes the antinomic rules $X \to Y$ and $X \to \overline{Y}$
       ($P_{15}(m) = 1$) or not ($P_{15}(m) = 0$).
P16    m equalizes the rules $X \to Y$ and $\overline{X} \to \overline{Y}$
       ($P_{16}(m) = 1$) or not ($P_{16}(m) = 0$).
P17    m is based on a probabilistic model ($P_{17}(m) = 1$) or not
       ($P_{17}(m) = 0$).
P18    m is statistical ($P_{18}(m) = 1$) or descriptive ($P_{18}(m) = 0$).
P19    m is discriminant ($P_{19}(m) = 1$) or not ($P_{19}(m) = 0$).

— Example: an individual who satisfies both the premise X and the conclusion Y
of the rule,

— Independence: case where the realization of X does not change the chances of
occurrence of Y, i.e. P(Y/X) = P(Y),

— Logical implication: case where the conditional probability P(Y/X) is equal to 1,

— Equilibrium or indetermination: case where Y is achieved when there is as
much chance that X or not X is realized,

— Attraction: case where the realization of X increases the chances of occurrence
of Y,

— Repulsion: case where the realization of X decreases the chances of occurrence
of Y.
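
To make these reference situations concrete, here is a small Python sketch that names the situation of a rule X -> Y from its counts. The function name `situation`, its argument names and the order in which cases are tested are our own illustrative choices, not part of the original study; the equilibrium case is deliberately left out, and a rule with P(Y/X) = 1 is reported as a logical implication even though it is generally also a case of attraction.

```python
def situation(n: int, n_x: int, n_y: int, n_xy: int) -> str:
    """Name the reference situation of a rule X -> Y.

    n    : number of records
    n_x  : records satisfying X (assumed > 0)
    n_y  : records satisfying Y
    n_xy : records satisfying both X and Y (the examples)
    """
    if n_xy == n_x:
        # P(Y/X) = 1: every record satisfying X also satisfies Y.
        return "logical implication"
    # Compare P(Y/X) = n_xy / n_x with P(Y) = n_y / n using integer
    # cross-multiplication, which avoids floating-point rounding.
    lhs, rhs = n_xy * n, n_x * n_y
    if lhs == rhs:
        return "independence"
    return "attraction" if lhs > rhs else "repulsion"

# Hypothetical counts: 100 records, 20 satisfy X, 50 satisfy Y.
labels = [situation(100, 20, 50, k) for k in (5, 10, 15, 20)]
```

With P(Y) = 0.5 in this example, 10 examples out of the 20 records satisfying X correspond to independence, fewer to repulsion and more to attraction.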

We now formalize the different properties encountered in the literature and
listed in Table 1. The title of each of the 19 properties is, preferably, the
property desired for a measure m.








Property 1: Asymmetric measure.

$P_1(m) = 0$ if $m$ is symmetric, i.e. if $\forall\, X \to Y,\ m(X \to Y) = m(Y \to X)$
$P_1(m) = 1$ if $m$ is not symmetric, i.e. if $\exists\, X \to Y \;/\; m(X \to Y) \neq m(Y \to X)$
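
Property 1 can be explored numerically. The sketch below is only an illustration under our own assumptions: measures are written as functions of the counts (n, n_x, n_y, n_xy), the reverse rule Y -> X is obtained by swapping n_x and n_y, and the search runs over a small finite grid of contingency tables. A returned value of 1 exhibits a counter-example to symmetry; a returned value of 0 only means that none was found on the grid, not that the measure is proven symmetric. The `lift` and `confidence` functions use their usual textbook definitions.

```python
from itertools import product

def confidence(n, n_x, n_y, n_xy):
    """P(Y/X) = n_xy / n_x."""
    return n_xy / n_x

def lift(n, n_x, n_y, n_xy):
    """P(XY) / (P(X) P(Y)) = n * n_xy / (n_x * n_y)."""
    return n * n_xy / (n_x * n_y)

def tables(n):
    """All feasible counts (n, n_x, n_y, n_xy) with non-empty X and Y."""
    for n_x, n_y in product(range(1, n + 1), repeat=2):
        lowest = max(0, n_x + n_y - n)          # inclusion-exclusion bound
        for n_xy in range(lowest, min(n_x, n_y) + 1):
            yield n, n_x, n_y, n_xy

def p1(measure, n=12, tol=1e-12):
    """Return 1 if some table gives m(X->Y) != m(Y->X), else 0 (grid only)."""
    for n_, n_x, n_y, n_xy in tables(n):
        forward = measure(n_, n_x, n_y, n_xy)
        backward = measure(n_, n_y, n_x, n_xy)   # swap premise and conclusion
        if abs(forward - backward) > tol:
            return 1
    return 0
```

On this grid, confidence is detected as asymmetric while no asymmetry is found for lift, which agrees with their definitions.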



Property 2: Asymmetric measure in the sense of the conclusion negation, or
measure that does not equalize the antinomic rules.

$P_2(m) = 0$ if $m$ is cn-symmetric, i.e. if $\forall\, X \to Y,\ m(X \to Y) = m(X \to \overline{Y})$
$P_2(m) = 1$ if $m$ is not cn-symmetric, i.e. if $\exists\, X \to Y \;/\; m(X \to Y) \neq m(X \to \overline{Y})$



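Property 3: Measure assessing in the same way $X \to Y$ and
$\overline{Y} \to \overline{X}$ in the logical implication case.

$P_3(m) = 0$ if $\exists\, X \to Y \;/\; P(Y/X) = 1$ and $m(X \to Y) \neq m(\overline{Y} \to \overline{X})$
$P_3(m) = 1$ if $\forall\, X \to Y,\ P(Y/X) = 1 \Rightarrow m(X \to Y) = m(\overline{Y} \to \overline{X})$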





Property 4: Measure increasing according to the number of examples or
decreasing with the number of counter-examples (the number of records
satisfying X but not Y).

$P_4(m) = 0$ if $m$ does not increase with $n_{XY}$, i.e. if
    $\exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\; n_{X_1} = n_{X_2}$ and $n_{Y_1} = n_{Y_2}$
    and ($n_{X_1Y_1} < n_{X_2Y_2}$ or $n_{X_1\overline{Y_1}} > n_{X_2\overline{Y_2}}$)
    and $m(X_1 \to Y_1) > m(X_2 \to Y_2)$,

$P_4(m) = 1$ if $m$ is increasing with $n_{XY}$, i.e. if
    $\forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2$,
    [$n_{X_1} = n_{X_2}$ and $n_{Y_1} = n_{Y_2}$ and ($n_{X_1Y_1} < n_{X_2Y_2}$ or $n_{X_1\overline{Y_1}} > n_{X_2\overline{Y_2}}$)]
    $\Rightarrow m(X_1 \to Y_1) \leq m(X_2 \to Y_2)$, and
    [$\exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\; n_{X_1} = n_{X_2}$ and $n_{Y_1} = n_{Y_2}$
    and ($n_{X_1Y_1} < n_{X_2Y_2}$ or $n_{X_1\overline{Y_1}} > n_{X_2\overline{Y_2}}$)
    and $m(X_1 \to Y_1) < m(X_2 \to Y_2)$]

with $n_{XY} = |X \cap Y|$ the number of records satisfying both X and Y, and
$n_{X\overline{Y}} = |X \cap \overline{Y}|$.

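Property 5: Measure increasing according to the size of the training set n.

$P_5(m) = 0$ ($m$ does not increase with $n$) if $\exists\, (\Omega_1, \Omega_2)$,
    $\exists\, X_1 \to Y_1\ (\Omega_1),\ \exists\, X_2 \to Y_2\ (\Omega_2) \;/\; n_{X_1} = n_{X_2}$ and $n_{Y_1} = n_{Y_2}$
    and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_1 < n_2$ and $m(X_1 \to Y_1) > m(X_2 \to Y_2)$

$P_5(m) = 1$ ($m$ increases with $n$) if $\forall\, \Omega_1,\ \forall\, \Omega_2$,
    $\forall\, X_1 \to Y_1\ (\Omega_1),\ \forall\, X_2 \to Y_2\ (\Omega_2)$,
    ($n_{X_1} = n_{X_2}$ and $n_{Y_1} = n_{Y_2}$ and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_1 < n_2$)
    $\Rightarrow m(X_1 \to Y_1) \leq m(X_2 \to Y_2)$,
    and $\exists\, \Omega_1,\ \exists\, \Omega_2,\ \exists\, X_1 \to Y_1\ (\Omega_1),\ \exists\, X_2 \to Y_2\ (\Omega_2) \;/\;$
    ($n_{X_1} = n_{X_2}$ and $n_{Y_1} = n_{Y_2}$ and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_1 < n_2$)
    and $m(X_1 \to Y_1) < m(X_2 \to Y_2)$

Here $n_1$ and $n_2$ denote the sizes of the data sets $\Omega_1$ and $\Omega_2$.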



Property 6: Measure decreasing according to the size of the consequent
($n_Y = |Y|$, the number of records satisfying Y) or the size of the premise
($n_X = |X|$, the number of records satisfying X).

$P_6(m) = 0$ if $m$ does not decrease with $n_Y$, i.e. if
    $\exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\; n_{X_1} = n_{X_2}$ and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_{Y_1} < n_{Y_2}$
    and $m(X_1 \to Y_1) < m(X_2 \to Y_2)$,

$P_6(m) = 1$ if $m$ is decreasing with $n_Y$, i.e. if
    $\forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2$, ($n_{X_1} = n_{X_2}$ and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_{Y_1} < n_{Y_2}$)
    $\Rightarrow m(X_1 \to Y_1) \geq m(X_2 \to Y_2)$, and
    $\exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\;$ ($n_{X_1} = n_{X_2}$ and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_{Y_1} < n_{Y_2}$)
    and $m(X_1 \to Y_1) > m(X_2 \to Y_2)$

If we consider the premise size, the property $P_6(m) = 1$ is also written:

$P_6(m) = 1$ if $m$ is decreasing with $n_X$, i.e. when
    $\forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2$, ($n_{Y_1} = n_{Y_2}$ and $n_{X_1Y_1} = n_{X_2Y_2}$ and $n_{X_1} < n_{X_2}$)
    $\Rightarrow m(X_1 \to Y_1) \geq m(X_2 \to Y_2)$



Property 7: Fixed value a in the independence case.

$P_7(m) = 0$ if $\forall\, a \in \mathbb{R},\ \exists\, X \to Y \;/\; P(Y/X) = P(Y)$ and $m(X \to Y) \neq a$
$P_7(m) = 1$ (fixed value) if $\exists\, a \in \mathbb{R} \;/\; \forall\, X \to Y,\ P(Y/X) = P(Y) \Rightarrow m(X \to Y) = a$
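
Property 7 lends itself to a simple numerical probe, sketched below under our own assumptions. The helper `independent_tables` (a hypothetical name) enumerates small contingency tables in which P(Y/X) = P(Y) holds exactly, and `p7` checks whether a measure takes a single value on all of them. As with any finite enumeration, a result of 1 is only evidence of a fixed value, whereas a result of 0 is a genuine counter-example. The `lift` and `confidence` functions follow their usual textbook definitions.

```python
def lift(n, n_x, n_y, n_xy):
    """P(XY) / (P(X) P(Y)) = n * n_xy / (n_x * n_y)."""
    return n * n_xy / (n_x * n_y)

def confidence(n, n_x, n_y, n_xy):
    """P(Y/X) = n_xy / n_x."""
    return n_xy / n_x

def independent_tables(max_n=30):
    """Yield counts (n, n_x, n_y, n_xy) such that n * n_xy == n_x * n_y,
    i.e. tables where X and Y are exactly independent."""
    for n in range(2, max_n + 1):
        for n_x in range(1, n):
            for n_y in range(1, n):
                if (n_x * n_y) % n == 0:
                    yield n, n_x, n_y, n_x * n_y // n

def p7(measure, tol=1e-12):
    """1 if the measure is constant over the enumerated independent tables."""
    values = [measure(*t) for t in independent_tables()]
    return 1 if max(values) - min(values) <= tol else 0
```

On these tables lift always equals 1, whereas confidence equals P(Y) and therefore varies from one table to another.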



Property 8: Fixed value b in the logical implication case.

$P_8(m) = 0$ if $\forall\, b \in \mathbb{R},\ \exists\, X \to Y \;/\; P(Y/X) = 1$ and $m(X \to Y) \neq b$
$P_8(m) = 1$ (fixed value) if $\exists\, b \in \mathbb{R} \;/\; \forall\, X \to Y,\ P(Y/X) = 1 \Rightarrow m(X \to Y) = b$



Property 9: Fixed value c in the equilibrium case.

$P_9(m) = 0$ if $\forall\, c \in \mathbb{R},\ \exists\, X \to Y \;/\; P(Y/X) = P(X)/2$ and $m(X \to Y) \neq c$
$P_9(m) = 1$ (fixed value) if $\exists\, c \in \mathbb{R} \;/\; \forall\, X \to Y,\ P(Y/X) = P(X)/2 \Rightarrow m(X \to Y) = c$





Property 10: Identified values in the attraction case between X and Y.

$P_{10}(m) = 0$ if $\forall\, a \in \mathbb{R},\ \exists\, X \to Y \;/\; P(Y/X) > P(Y)$ and $m(X \to Y) \leq a$
$P_{10}(m) = 1$ (identified values) if $\exists\, a \in \mathbb{R} \;/\; \forall\, X \to Y,\ P(Y/X) > P(Y) \Rightarrow m(X \to Y) > a$





Property 11: Identified values in the repulsion case between X and Y.

$P_{11}(m) = 0$ if $\forall\, a \in \mathbb{R},\ \exists\, X \to Y \;/\; P(Y/X) < P(Y)$ and $m(X \to Y) \geq a$
$P_{11}(m) = 1$ (identified values) if $\exists\, a \in \mathbb{R} \;/\; \forall\, X \to Y,\ P(Y/X) < P(Y) \Rightarrow m(X \to Y) < a$





Property 12: Tolerance to the first counter-examples.

$P_{12}(m) = 0$ if rejection, i.e. convex:
    $\exists\, min_{conf} \in [0,1] \;/\; \forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2,\ \forall\, \lambda \in [0,1]$,
    $n_{X_1Y_1} > min_{conf}\, n_{X_1}$ and $n_{X_2Y_2} > min_{conf}\, n_{X_2}$
    $\Rightarrow f_{m,n_{XY}}(\lambda n_{X_1Y_1} + (1-\lambda) n_{X_2Y_2}) < \lambda f_{m,n_{XY}}(n_{X_1Y_1}) + (1-\lambda) f_{m,n_{XY}}(n_{X_2Y_2})$

$P_{12}(m) = 1$ if indifference, i.e. linear: $P_{12}(m) \neq 0$ and $P_{12}(m) \neq 2$

$P_{12}(m) = 2$ if tolerance, i.e. concave:
    $\exists\, min_{conf} \in [0,1] \;/\; \forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2,\ \forall\, \lambda \in [0,1]$,
    $n_{X_1Y_1} > min_{conf}\, n_{X_1}$ and $n_{X_2Y_2} > min_{conf}\, n_{X_2}$
    $\Rightarrow f_{m,n_{XY}}(\lambda n_{X_1Y_1} + (1-\lambda) n_{X_2Y_2}) > \lambda f_{m,n_{XY}}(n_{X_1Y_1}) + (1-\lambda) f_{m,n_{XY}}(n_{X_2Y_2})$

The notation $f_{m,n_{XY}}$ denotes the evolution of the measure $m$ as a function
of $n_{XY}$ when $n_X$, $n_Y$ and $n$ remain constant.

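Property 13: Invariance in case of expansion of certain quantities
($n_{XY}$, $n_{X\overline{Y}}$, $n_{\overline{X}Y}$ and $n_{\overline{X}\,\overline{Y}}$).

$P_{13}(m) = 0$ (variance) if $\exists\, (k_1, k_2) \in \mathbb{N}^{*2},\ \exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\;$
    [$n_{X_1Y_1} = k_1 n_{X_2Y_2}$ and $n_{X_1\overline{Y_1}} = k_1 n_{X_2\overline{Y_2}}$
     and $n_{\overline{X_1}Y_1} = k_2 n_{\overline{X_2}Y_2}$ and $n_{\overline{X_1}\,\overline{Y_1}} = k_2 n_{\overline{X_2}\,\overline{Y_2}}$
     and $m(X_1 \to Y_1) \neq m(X_2 \to Y_2)$] or
    [$n_{X_1Y_1} = k_1 n_{X_2Y_2}$ and $n_{\overline{X_1}Y_1} = k_1 n_{\overline{X_2}Y_2}$
     and $n_{X_1\overline{Y_1}} = k_2 n_{X_2\overline{Y_2}}$ and $n_{\overline{X_1}\,\overline{Y_1}} = k_2 n_{\overline{X_2}\,\overline{Y_2}}$
     and $m(X_1 \to Y_1) \neq m(X_2 \to Y_2)$]

$P_{13}(m) = 1$ (invariance) if $\forall\, (k_1, k_2) \in \mathbb{N}^{*2},\ \forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2$,
    [($n_{X_1Y_1} = k_1 n_{X_2Y_2}$ and $n_{X_1\overline{Y_1}} = k_1 n_{X_2\overline{Y_2}}$
      and $n_{\overline{X_1}Y_1} = k_2 n_{\overline{X_2}Y_2}$ and $n_{\overline{X_1}\,\overline{Y_1}} = k_2 n_{\overline{X_2}\,\overline{Y_2}}$)
     $\Rightarrow m(X_1 \to Y_1) = m(X_2 \to Y_2)$] and
    [($n_{X_1Y_1} = k_1 n_{X_2Y_2}$ and $n_{\overline{X_1}Y_1} = k_1 n_{\overline{X_2}Y_2}$
      and $n_{X_1\overline{Y_1}} = k_2 n_{X_2\overline{Y_2}}$ and $n_{\overline{X_1}\,\overline{Y_1}} = k_2 n_{\overline{X_2}\,\overline{Y_2}}$)
     $\Rightarrow m(X_1 \to Y_1) = m(X_2 \to Y_2)$]

It is important to note that the matrix-based formalization of this property
given by (Tan et al, 2002) is more compact than the one we present; in this
article, however, we use the same style of formalization for all the properties.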

Property 14: Desired relationship between the rules $X \to Y$ and
$\overline{X} \to Y$.

$P_{14}(m) = 0$ if $\exists\, X \to Y \;/\; m(X \to Y) \neq -m(\overline{X} \to Y)$
$P_{14}(m) = 1$ if $\forall\, X \to Y,\ m(X \to Y) = -m(\overline{X} \to Y)$

Property 15: Desired relationship between the antinomic rules $X \to Y$ and
$X \to \overline{Y}$.

$P_{15}(m) = 0$ if $\exists\, X \to Y \;/\; m(X \to Y) \neq -m(X \to \overline{Y})$
$P_{15}(m) = 1$ if $\forall\, X \to Y,\ m(X \to Y) = -m(X \to \overline{Y})$

Property 16: Desired relationship between the rules $X \to Y$ and
$\overline{X} \to \overline{Y}$.

$P_{16}(m) = 0$ if $\exists\, X \to Y \;/\; m(X \to Y) \neq m(\overline{X} \to \overline{Y})$
$P_{16}(m) = 1$ if $\forall\, X \to Y,\ m(X \to Y) = m(\overline{X} \to \overline{Y})$





Property 17: Premise size is fixed or random.

$P_{17}(m) = 0$ (fixed size) if $m$ is not established on a probabilistic model
$P_{17}(m) = 1$ (random size) if $m$ is established on a probabilistic model



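Property 18: Descriptive or statistical measure.

$P_{18}(m) = 0$ (descriptive or invariant) if $\forall\, k \in \mathbb{N}^*,\ \forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2$,
    ($n_{X_1Y_1} = k\, n_{X_2Y_2}$ and $n_{X_1\overline{Y_1}} = k\, n_{X_2\overline{Y_2}}$
     and $n_{\overline{X_1}Y_1} = k\, n_{\overline{X_2}Y_2}$ and $n_{\overline{X_1}\,\overline{Y_1}} = k\, n_{\overline{X_2}\,\overline{Y_2}}$)
    $\Rightarrow m(X_1 \to Y_1) = m(X_2 \to Y_2)$

$P_{18}(m) = 1$ (statistical) if $\exists\, k \in \mathbb{N}^*,\ \exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\;$
    ($n_{X_1Y_1} = k\, n_{X_2Y_2}$ and $n_{X_1\overline{Y_1}} = k\, n_{X_2\overline{Y_2}}$
     and $n_{\overline{X_1}Y_1} = k\, n_{\overline{X_2}Y_2}$ and $n_{\overline{X_1}\,\overline{Y_1}} = k\, n_{\overline{X_2}\,\overline{Y_2}}$)
    and $m(X_1 \to Y_1) \neq m(X_2 \to Y_2)$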
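
The distinction made by Property 18 can be illustrated by multiplying every cell of a contingency table by the same factor k and comparing the values of a measure before and after. The sketch below is an illustration under our own assumptions: the example tables are hypothetical, the check covers only a few tables and factors, and the implication index is written with its usual definition (the standardized deviation of the number of counter-examples from its expectation under independence).

```python
import math

def confidence(n, n_x, n_y, n_xy):
    """P(Y/X) = n_xy / n_x."""
    return n_xy / n_x

def implication_index(n, n_x, n_y, n_xy):
    """(n_x_not_y - n_x * n_not_y / n) / sqrt(n_x * n_not_y / n)."""
    expected = n_x * (n - n_y) / n      # expected counter-examples
    observed = n_x - n_xy               # observed counter-examples
    return (observed - expected) / math.sqrt(expected)

def scaled(table, k):
    """Multiply n, n_x, n_y and n_xy (hence all four cells) by k."""
    return tuple(k * v for v in table)

def p18(measure, tables, factors=(2, 3, 10), tol=1e-9):
    """1 (statistical) if scaling changes the measure on some table,
    0 (descriptive) if no change is observed on the tables provided."""
    for table in tables:
        for k in factors:
            if abs(measure(*scaled(table, k)) - measure(*table)) > tol:
                return 1
    return 0

# Hypothetical tables (n, n_x, n_y, n_xy), for illustration only.
TABLES = [(100, 40, 50, 30), (50, 10, 20, 8)]
```

Confidence is unchanged by the scaling, whereas the implication index grows in magnitude with the factor k, which is the behavior expected of a statistical measure.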



Property 19: Discriminant measure.

$P_{19}(m) = 0$ (non discriminant) if $\exists\, \eta \in \mathbb{N}^* \;/\; \forall\, n > \eta,\ \forall\, X_1 \to Y_1,\ \forall\, X_2 \to Y_2$,
    [$P(Y_1/X_1) > P(Y_1)$ and $P(Y_2/X_2) > P(Y_2)$] $\Rightarrow m(X_1 \to Y_1) \simeq m(X_2 \to Y_2)$

$P_{19}(m) = 1$ (discriminant) if $\forall\, \eta \in \mathbb{N}^*,\ \exists\, n > \eta,\ \exists\, X_1 \to Y_1,\ \exists\, X_2 \to Y_2 \;/\;$
    [$P(Y_1/X_1) > P(Y_1)$ and $P(Y_2/X_2) > P(Y_2)$] and $m(X_1 \to Y_1) \neq m(X_2 \to Y_2)$

Having formalized the properties, we now study them on the different objective
measures.

4. Evaluation of properties on measures

This section determines, for different objective interestingness measures, the
presence or absence of the properties recalled and formalized in Section 3. This
work leads to the construction of a measure-property matrix.

We examined 69 measures, of which 46 are from survey work (Piatetsky-Shapiro,
1991), (Tan et al, 2002), (Lallich and Teytaud, 2004), (Geng and Hamilton, 2007),
(Vaillant, 2006) and (Feno, 2007). Nine measures described in (Huynh et al, 2005)
have also been studied: {Causal confidence, Causal confirmed confidence,
Descriptive confirmed confidence, Causal confirmation, Descriptive confirmation,
Dependency, Putative causal dependency, Pavilion and Causal support}.

Finally, the remaining measures are the following: Czekanowski-Dice
(Czekanowski, 1913), Fukuda (Fukuda et al, 1996), Ganascia (Ganascia, 1987),
probabilistic index of deviation from equilibrium (Blanchard et al, 2005),
probabilistic index of deviation from the entropic equilibrium (Blanchard et al,
2005), entropic intensity of implication (Gras et al, 2001), likelihood link index
(Lerman and Aze, 2007), Kappa (Cohen, 1960), Kulczynski (Kulczynski, 1928), MGK
(Guillaume, 2000), Ochiai (Ochiai, 1957), satisfaction (Lavrac et al, 1999) and
VT100 (Rakotomalala and Morineau, 2008).

While studying these measures, we detected measures that have the same
definition but different names. They are as follows:

— {φ-coefficient or Correlation coefficient};

— {Cohen or Kappa};

— {Centred confidence or Added value or Pavilion};

— {Descriptive-confirmed confidence or Ganascia};

— {Cosine or Ochiai};

— {Czekanowski-Dice or F-measure};

— {Bayes factor or Odd-multiplier};

— {Factor of certainty or Satisfaction or Loevinger};

— {Kulczynski or Agreement and disagreement index};

— {Support or Russel and Rao index};

— {Accuracy or Causal support}.

Therefore, if we keep only one measure from each of the sets listed above, we are
left with 61 measures. Table 2 summarizes them and groups them into two
categories: symmetrical and asymmetrical measures. The definition of each index
is available in Appendix 1, in Table 5. The 61 measures of the table are ordered
alphabetically, and the number assigned to each measure in the table makes it
easier to look up its definition. Having presented the data on which we will
perform a classification, we now check whether they can be reduced, by searching
for groups of measures with identical behavior and by checking whether any
properties are redundant.

First, we searched for all measures whose values are identical on each of the
19 properties. We found the following seven groups: G1 = {Correlation
coefficient, Novelty}, G2 = {Causal confidence, Causal-confirm confidence,
Negative reliability}, G3 = {Cosine, Czekanowski-Dice}, G4 = {Causal dependency,
Leverage, Specificity}, G5 = {Collective strength, Odds ratio}, G6 = {Gini,
Mutual information} and G7 = {Jaccard, Kulczynski}.

Following the detection of these seven groups, we are left with a matrix of 52
measures, since we retain only one measure from each group.

To check whether any properties are redundant, we investigated whether a
property had identical values with another property on each of the 52 measures.
We found no such relationship.
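
The two checks described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used for the study: the function names, the dictionary representation of the matrix and the toy data (measures "m1" to "m4" with three made-up properties) are all hypothetical.

```python
from collections import defaultdict

def identical_groups(matrix):
    """Group measures whose property profiles are identical.

    matrix: dict mapping a measure name to a tuple of property values.
    Returns the groups of size >= 2, each sorted, in sorted order.
    """
    by_profile = defaultdict(list)
    for name, profile in matrix.items():
        by_profile[tuple(profile)].append(name)
    return sorted(sorted(group) for group in by_profile.values() if len(group) > 1)

def redundant_properties(matrix):
    """Pairs (i, j), i < j, of property columns that are identical on all measures."""
    columns = list(zip(*matrix.values()))
    return [(i, j)
            for i in range(len(columns))
            for j in range(i + 1, len(columns))
            if columns[i] == columns[j]]

# Hypothetical toy matrix: 4 measures x 3 properties (illustration only).
TOY = {
    "m1": (1, 0, 1),
    "m2": (1, 0, 1),
    "m3": (0, 0, 0),
    "m4": (0, 1, 0),
}

groups = identical_groups(TOY)          # measures to merge
duplicates = redundant_properties(TOY)  # property columns to drop
```

In the toy matrix, m1 and m2 share the same profile and the first and third properties coincide on every measure, so one measure and one property could be dropped without loss.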



5. Categorization 

At this point, we have a matrix of 52 measures and 19 properties, the properties
being nominal qualitative variables. It is not easy for data mining experts to
choose the appropriate interestingness measure from a set of 52 measures. It is
therefore useful to identify groups of measures with similar properties, to help
the user find the most suitable ones. The most commonly used technique for
finding such relationships is cluster analysis (Fayyad et al., 1996), (Hartigan,
1975).

Clustering techniques are generally used in an unsupervised fashion. They place
data elements into several groups such that elements in the same group are close
to each other and elements in different groups are far from each other (Duda and
Hart, 1973). Many efficient clustering algorithms exist in the data mining
literature, among which the best known and most widely used are k-means
clustering and Agglomerative Hierarchical Clustering (AHC). Choosing one of these
techniques is not an easy task, since each of them has advantages and
limitations.



5.1. K-means technique 

K-means clustering (MacQueen, 1967) is a commonly used method of cluster analysis
(Bradley et al., 1998), (Fanstrom et al., 2000), (Roweis and Ghahramani, 1999)
which aims to automatically partition observations into k groups that are as
distinct as possible, where k is provided as an input parameter. It is an
iterative aggregation method which, wherever it starts from, converges on a
solution. K-means has several advantages. It is simple and fast: with a large
number of variables, it may be computationally faster than hierarchical
clustering (when k is small). In addition, an element assigned to a group during
one iteration may move to another group in the following iteration, which is not
possible with AHC, for which assignment is irreversible.

Despite these advantages, k-means requires the number of clusters k to be
specified as an input, and the appropriate value of k can be difficult to
predict. An inappropriate choice of k may yield poor results. Another
disadvantage of this technique is its sensitivity to the starting locations of
the cluster centers: different starting points may lead to different solutions,
so the clustering obtained is not necessarily the same for all starting points.
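
The dependence on the starting points can be shown on a tiny example. The following is a minimal sketch of the standard Lloyd iteration written in plain Python, assuming squared Euclidean distance and explicitly supplied initial centers; it is not the Matlab implementation used in this study, and the four points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centers, max_iter=100):
    """Lloyd iteration from the given initial centers (max_iter >= 1).

    Returns (labels, centers), where labels[i] is the index of the
    center closest to points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:      # no center moved: converged
            break
        centers = new_centers
    return labels, centers

# Four corners of a 4 x 1 rectangle (hypothetical data).
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]
labels_a, _ = kmeans(POINTS, [(0, 0), (4, 0)])   # centers start left / right
labels_b, _ = kmeans(POINTS, [(2, 0), (2, 1)])   # centers start bottom / top
```

Both runs converge, but to different partitions: the first separates the left corners from the right ones, while the second stays with the bottom/top split it started from, which is a poorer solution for these points.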
5.2. AHC technique

In data mining, hierarchical clustering (Ward, 1963) is one of the most
frequently used methods of cluster analysis; it seeks to build a hierarchy of
clusters. Agglomerative hierarchical clustering (Guha et al., 1998), (Guha et al.,
2000), (Karypis et al., 1999), (King, 1967), (Sneath and Sokal, 1973) is a
"bottom-up" clustering method in which each observation starts in its own
cluster, and pairs of clusters are merged as one moves up the hierarchy.
Hierarchical clustering solutions, which take the form of trees called
dendrograms, are of great interest for a number of application domains. Despite
its proven utility, hierarchical clustering has several flaws: for example, the
interpretation of the hierarchy is complex and often confusing, and the use of
different distance metrics for measuring distances between clusters may generate
different results. Nevertheless, AHC also has advantages: it can produce an
ordering of the elements, which may be informative for data display, and it
generates smaller clusters, which may be helpful for discovery.

The respective strengths of the agglomerative hierarchical clustering and
k-means clustering techniques encourage us to apply both of them to our
measure-property matrix in order to reach a consensus.

Since the versions of the clustering algorithms we use require binary variables,
we perform a complete disjunctive encoding of the properties, which yields 39
binary variables. We thus finally have a matrix of 52 measures x 39 binary
variables.
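
The count of 39 follows from the encoding itself: each of the 18 binary properties contributes two indicator columns and property P12, which has three values, contributes three. The short sketch below illustrates a complete disjunctive (one-hot) encoding under that assumption; the function name and the example profile are hypothetical.

```python
def disjunctive_encoding(profile, modalities):
    """Complete disjunctive encoding of one measure.

    profile    : sequence of nominal values, one per property
    modalities : for each property, the tuple of its possible values
    Returns a flat list of 0/1 indicators, one per (property, value) pair.
    """
    encoded = []
    for value, allowed in zip(profile, modalities):
        if value not in allowed:
            raise ValueError(f"unexpected value {value!r}")
        encoded.extend(1 if value == a else 0 for a in allowed)
    return encoded

# 19 properties: all binary except P12 (index 11), which takes values 0, 1, 2.
MODALITIES = [(0, 1)] * 19
MODALITIES[11] = (0, 1, 2)

# Hypothetical profile: every binary property equal to 1 and P12 equal to 2.
example_profile = [1] * 19
example_profile[11] = 2
example_row = disjunctive_encoding(example_profile, MODALITIES)
```

Each encoded row contains exactly one 1 per property, so it has 39 entries of which 19 are equal to 1.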

Having presented the data and converted them so that the selected algorithms can
be applied, we now study the first clustering of measures, obtained with a
hierarchical cluster analysis method.



6. Classification obtained by AHC method 

We performed an agglomerative hierarchical clustering with the Matlab software on
these 52 measures, using the Euclidean distance between pairs of measures and the
Ward criterion for the aggregation phase. Figure 1 shows the resulting
classification. As the loss of interclass inertia must be as small as possible,
we cut the dendrogram at a level where the branch height is high, which
corresponds to the colored branches of the dendrogram.
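
For readers unfamiliar with the Ward criterion, the following naive Python sketch shows the principle: at each step, the two clusters whose merge causes the smallest increase in within-cluster inertia are merged. It is a didactic illustration under our own assumptions (cubic-time search, merging stopped when k clusters remain, hypothetical one-dimensional data), not the Matlab routine used to produce Figure 1.

```python
def ward_clusters(points, k):
    """Naive agglomerative clustering with the Ward criterion.

    points : list of equal-length numeric tuples
    k      : number of clusters at which merging stops
    Returns the clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(cluster):
        return [sum(points[i][d] for i in cluster) / len(cluster) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster inertia caused by merging a and b:
        # |a| * |b| / (|a| + |b|) * squared distance between centroids.
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical one-dimensional data with three visible groups.
DATA = [(0,), (1,), (10,), (11,), (50,)]
result = ward_clusters(DATA, 3)
```

On this data the two tight pairs are merged first and the isolated point remains alone, which is the kind of structure one reads off a dendrogram when cutting it at three clusters.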

We could also have chosen the Manhattan distance and would have obtained similar
results, because the matrix is essentially binary: 18 of the 19 variables are
binary and, for binary variables, the Manhattan distance equals the squared
Euclidean distance. Only one variable, property P12, takes three values.
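
The equality between the two distances on binary data holds because |a - b| and (a - b)^2 coincide when a and b are 0 or 1. The short check below illustrates this on hypothetical vectors, together with a three-valued coordinate for which the two distances differ.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def squared_euclidean(u, v):
    """Sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical binary profiles: the two distances coincide.
u, v = (1, 0, 1, 1, 0), (0, 0, 1, 0, 1)
binary_equal = manhattan(u, v) == squared_euclidean(u, v)

# A coordinate taking the values 0, 1 or 2 (as P12 does) can break the equality.
w, z = (2, 0), (0, 0)
ternary_pair = (manhattan(w, z), squared_euclidean(w, z))
```

After a complete disjunctive encoding every coordinate is binary, so the equality then holds on the whole encoded matrix.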

This classification reveals the following 8 groups of measures:

— Gc1 = {Likelihood link index, Intensity of implication (II)}

— Gc2 = {REII, EII, PDI, IP3E, IPEE}

— Gc3 = {Two-way variation Support, Pearl}

— Gc4 = {Implication index, Fukuda, Gini, J-measure, Dependency, Weighted
dependency, Prevalence, Coverage}

— Gc5 = {VT100, Accuracy, Jaccard, Support, Cosine, Recall, Causal dependency,
Causal confirm, Causal confidence}

— Gc6 = {Sebag, Least contradiction, Descriptive confirmation, Examples rate,
Ganascia, Laplace, Confidence}

— Gc7 = {Zhang, MGK, Yule's Y, Yule's Q, Goodman, Piatetsky-Shapiro,
Correlation coefficient}

— Gc8 = {Interest, Informational gain, Collective strength, Cohen, Relative risk,
Bayes factor, Conviction, Factor of certainty, Pavilion, Klosgen, Two-way
support, One-way support}

Fig. 1. Agglomerative hierarchical clustering using Ward criterion

Having obtained this initial classification of the measures, we compare it with
the classification produced by the second technique, the k-means method, and
then discuss the different results in order to reach a consensus.



7. Classification obtained by a version of k-means

We performed a partitioning with the k-means method using the Matlab software,
again with the Euclidean distance. We chose eight classes, in accordance with the
results of the AHC, and obtained the following partitioning. While presenting
these eight new classes, we discuss their consistency with the results obtained
with the first technique.

— Gp1 = {Likelihood link index, Intensity of implication (II), REII}
This group is very close to the group Gc1 since we have Gp1 = Gc1 ∪ {REII}.

— Gp2 = {EII, PDI, IP3E, IPEE}
This group is very close to the group Gc2 since we have Gp2 = Gc2 − {REII}.
We have the following equality: Gp1 ∪ Gp2 = Gc1 ∪ Gc2, which shows some
consistency in the obtained results, since these groups contain all the indices
of the likelihood link family.

— Gp3 = {Two-way variation Support, Pearl, Implication index, Gini, J-measure,
Dependency, Prevalence, Coverage}
This group is close to the group Gc4 since we have:
Gp3 = Gc3 ∪ Gc4 − {Fukuda, Weighted dependency}. It should be noted that the
group Gc3, which is composed of the Two-way variation Support and Pearl
measures, is the closest group to Gc4 (see the dendrogram in Figure 1).

— Gp4 = {Accuracy, Jaccard, Support, Cosine, Recall, Causal dependency, Causal
confirm, Causal confidence, Fukuda, Weighted dependency}
This group is similar to the group Gc5 since we have:
Gp4 = Gc5 ∪ {Fukuda, Weighted dependency} − {VT100}.

— Gp5 = {Sebag, Least contradiction, Descriptive confirmation, Examples rate,
Ganascia, Laplace, Confidence}
This group is identical to the group Gc6.

— Gp6 = {Zhang, MGK, Yule's Y, Yule's Q, Goodman}
This group is similar to the group Gc7 since we have:
Gc7 = Gp6 ∪ {Piatetsky-Shapiro, Correlation coefficient}.

— Gp7 = {Interest, Informational gain, Relative risk, Bayes factor, Conviction,
Factor of certainty, Pavilion, Klosgen, Two-way support, One-way support}
The group Gp7 is very close to the group Gc8 since they have 10 of 12 measures in
common. We have the following equality: Gc8 = Gp7 ∪ {Collective strength,
Cohen}.

— Gp8 = {VT100, Piatetsky-Shapiro, Correlation coefficient, Collective strength,
Cohen}
Unlike the other groups Gpi (i = 1, ..., 7), this group is not similar to any of
the groups Gcj (j = 1, ..., 8), since these five measures come from the groups
Gc5, Gc7 and Gc8.
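
Comparisons of this kind reduce to elementary set operations. The sketch below illustrates them on the likelihood link family only, using the members of Gc1, Gc2, Gp1 and Gp2 as listed in the text; the Jaccard-based `best_match` helper is a hypothetical convenience of ours for pairing a group from one technique with its closest counterpart from the other, and is not the procedure used in the study.

```python
# Groups restricted to the likelihood link family, as listed in the text.
GC = {
    "Gc1": {"Likelihood link index", "II"},
    "Gc2": {"REII", "EII", "PDI", "IP3E", "IPEE"},
}
GP = {
    "Gp1": {"Likelihood link index", "II", "REII"},
    "Gp2": {"EII", "PDI", "IP3E", "IPEE"},
}

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b)

def best_match(group, reference):
    """Name of the reference group with the highest Jaccard similarity."""
    return max(reference, key=lambda name: jaccard(group, reference[name]))

same_union = GP["Gp1"] | GP["Gp2"] == GC["Gc1"] | GC["Gc2"]
moved = GP["Gp1"] - GC["Gc1"]                     # measures that changed group
matches = {name: best_match(group, GC) for name, group in GP.items()}
```

The two techniques cover the same set of measures with these groups and differ only in where REII is placed.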

A consensus on the classification is presented in the following. 
Fig. 2. Clusters of measures.



8. Final classification 

After discussing the consistency of the results obtained by both techniques, we
derive a consensus on the classification. Figure 2 shows this consensus and
presents the classes C1 to C7 of the measures grouped together by both
techniques. We also include the measures for which no consensus was found and
give, where possible, the two groups (or classes) to which each of them belongs.
We have labeled the arrows "c" and "p" to indicate which technique placed the
measures in the pointed group (c = hierarchical clustering, p = partitioning, or
non-hierarchical clustering). Finally, in the lower center of the figure, we
recall the measures that are identical but have different names.

Having summarized the results obtained (Figure 2), we try in the next section to
give a meaning to some of the extracted classes and to validate this
classification against those published by (Vaillant, 2006), (Lesot and Rifqi,
2010), (Zighed et al, 2011).



9. Clusters review and validation 



It is not easy to give a semantic to each of the extracted classes by looking
only at the definitions of the measures. Two classes are nevertheless easy to
interpret: classes C1 and C2, which contain all the indices of the likelihood
link index family (Lerman, 1970), the founding index. Class C1 contains the
original indices: the likelihood link index and the intensity of implication
(II) (Gras, 1979). These two measures are very close: the likelihood link
index tests whether the number of examples (individuals that satisfy both the
premise and the conclusion) is significantly high, while the intensity of
implication assesses whether the number of counter-examples (individuals that
satisfy the premise but not the conclusion) is significantly low.

In class C2, we find the entropic implication intensity measures (EII (Gras et
al, 2001) and IP3E (Blanchard et al, 2005)) together with the probabilistic
index of deviation from equilibrium (IPEE (Blanchard et al, 2005)) and the
probabilistic discriminant index PDI (Lerman and Azé, 2007). These measures
derive from a common idea: assessing the significance of a count (the number
of examples or of counter-examples), combined for some measures (REII (Lallich
et al, 2005), EII, IP3E) with an entropic index so that the measure remains
discriminant on large data. As for PDI, it normalizes the intensity of
implication so that the latter remains discriminant on large data, by
evaluating a rule with respect to the set of valid rules.

To try to explain each of these classes Ci (i = 1, .., 7), we summarize in
Table 3 all the properties satisfied by each of the seven classes. We add a
symbol to the original matrix, the "?" character, which means "unknown", that
is to say that the measures of class Ci take different values for the property
Pj (j = 1, .., 19) concerned. When the property is contradicted by a single
measure, we show the majority value followed by "?". Thus "0?" means that all
the measures of class Ci except one take the value "0" for property Pj.
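
The summarizing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_property` and the toy values are hypothetical and do not come from the original matrix.

```python
from collections import Counter

def summarize_property(values):
    """Summarize one property column for the measures of a class.

    Returns the common value if all measures agree, the majority value
    followed by "?" if exactly one measure disagrees, and "?" otherwise.
    """
    counts = Counter(values)
    if len(counts) == 1:
        return str(values[0])
    majority, freq = counts.most_common(1)[0]
    if freq == len(values) - 1:
        return f"{majority}?"
    return "?"

# Hypothetical class of five measures evaluated on three properties.
toy_class = {
    "P1": [0, 0, 0, 1, 1],   # mixed values
    "P4": [1, 1, 1, 1, 0],   # contradicted by a single measure
    "P7": [1, 1, 1, 1, 1],   # shared by all measures
}
summary = {prop: summarize_property(vals) for prop, vals in toy_class.items()}
print(summary)
```

Applied to every class and every property, such a rule yields a table with one row per class instead of one row per measure.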

By summarizing in this table all the properties satisfied by each of the seven
classes, we help users select their measure(s), since they only have to read a
matrix that is much smaller than the original one. Moreover, if they want very
different measures, their choice is also made easier by this table, together
with the dendrogram shown in Figure 1, where a notion of proximity between
measures appears. Finally, this classification also helps to avoid choosing
measures that are too similar, that is, taking several indices from the same
class.

As for finding a semantic for each class, this synthetic table can support an
interpretation, as we illustrate for classes C4 and C6. We therefore focus on
these two classes and try to interpret them, starting with class C6.

9.1. Study of the C6 class

Class C6 is composed of five measures: Zhang (Zhang, 2000), MGK (Guillaume,
2000), Y and Q of Yule (Yule, 1900) and Goodman (Tan et al, 2002). We know
from Table 3 that they satisfy the following properties:

— Non symmetry in the sense of conclusion negation (P2 = 1), 

— Identical evaluation in the logical implication case (P3 = 1), 

— Growth according to the number of examples (P4 = 1), 

— Growth according to the data size (P5 = 1), 

— Fixed value in the independence case (P7 = 1), 

— Fixed value in the logical implication case (P8 = 1),

— Identifiable values when the realization of the premise increases the chances
of occurrence of the conclusion (P10 = 1),

— Identifiable values when the realization of the premise reduces the chances of
occurrence of the conclusion (P11 = 1),

— Opposed values for the antinomic rules $X \rightarrow Y$ and
$X \rightarrow \overline{Y}$ (P15 = 1),

— Discriminant in the case of large data (P19 = 1).

Table 3. Characteristics of the seven detected classes

From the set of satisfied properties, we can give a first semantic for class
C6. These measures are standardized indices, since they take fixed values in
the independence (P7 = 1) and logical implication (P8 = 1) cases, and their
values indicate whether the rule lies in the attraction area (P10 = 1) or in
the repulsion area (P11 = 1).

Figure 3 allows us to check the first semantic given to these indices. We
plotted the evolution of the five measures as the number of examples
increases, starting from the incompatibility state (no individual satisfies
both the premise and the conclusion, i.e. $n_{xy} = 0$, with $n_{xy}$ the
number of individuals satisfying both the premise X and the conclusion Y) and
ending at logical implication (the set of individuals satisfying the premise
is included in the set of individuals satisfying the conclusion, i.e.
$n_{xy} = n_x$, with $n_x$ the number of individuals satisfying the premise
X). We also show in Figure 3 the three characteristic states of a rule
(incompatibility, independence and logical implication) together with the
attraction and repulsion areas. The curves were drawn with a premise of size
174, a conclusion of size 400 and a dataset of size 600 ($n_x = 174$,
$n_y = 400$ and $n = 600$). Other sizes would have given similar curves,
provided that the constraint $n_x < n_y < n$ holds.

[Figure 3: curves of the measures against the number of individuals common to
X and Y, with the incompatibility, independence and logical implication states
and the repulsive and attractive zones marked.]

Fig. 3. Evolution of the five measures of the C6 class according to the number
of examples.

Table 4. Evaluation of properties on the measures of Class 6.

Figure 3 allows us to refine the semantic given to class C6. These are
standardized measures with values between -1 and 1, with fixed values equal to
-1, 0 and 1 for incompatibility, independence and logical implication
respectively. Moreover, their values are not merely identifiable in the
attraction and repulsion areas: they lie between 0 and 1 in the attraction
area and between -1 and 0 in the repulsion area. Finally, the sign of the
measure tells which area the rule belongs to. We can deduce that these
measures assess a distance from independence: the distance between
independence and logical implication for positive values, and the distance
between independence and incompatibility for negative values.
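
The three fixed values can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes the usual definitions of Zhang, MGK and Yule's Q and Y (Goodman is left out for brevity), the function name `c6_measures` is hypothetical, and the sizes are those used for Figure 3.

```python
import math

def c6_measures(n_xy, n_x=174, n_y=400, n=600):
    """Zhang, MGK, Yule's Q and Yule's Y for the rule X -> Y, from counts."""
    # Joint probabilities of the 2x2 contingency table.
    a = n_xy / n                          # p(X, Y)
    b = (n_x - n_xy) / n                  # p(X, not Y)
    c = (n_y - n_xy) / n                  # p(not X, Y)
    d = (n - n_x - n_y + n_xy) / n        # p(not X, not Y)
    p_x, p_y = n_x / n, n_y / n
    conf = a / p_x                        # p(Y | X)

    zhang = (a - p_x * p_y) / max(a * (1 - p_y), p_y * b)
    if conf > p_y:
        mgk = (conf - p_y) / (1 - p_y)
    else:
        mgk = (conf - p_y) / p_y
    yule_q = (a * d - b * c) / (a * d + b * c)
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    yule_y = (root_ad - root_bc) / (root_ad + root_bc)
    return {"Zhang": zhang, "MGK": mgk, "Yule Q": yule_q, "Yule Y": yule_y}

n_x, n_y, n = 174, 400, 600
states = {
    "incompatibility": 0,
    "independence": n_x * n_y // n,      # n_xy = n_x * n_y / n = 116
    "logical implication": n_x,
}
for name, n_xy in states.items():
    values = c6_measures(n_xy)
    print(name, {k: round(v, 6) for k, v in values.items()})
```

Up to floating-point rounding, the four measures are -1 at incompatibility, 0 at independence and 1 at logical implication; any value of `n_xy` between 116 and 174 gives positive values, and any value between 0 and 116 gives negative ones.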

In the figure showing the hierarchical clustering, the indices Y and Q of Yule
and Goodman are closer to each other, as are Zhang and MGK. The discrepancies
highlighted in Table 3, that is, the properties marked with the symbol "?",
explain these two stronger proximities. Table 4 details the properties
satisfied by the five measures of this group and recalls the general
characteristics of the class. The first property where this symbol appears,
and which helps to explain these two proximities, is the symmetry of the
measures (P1). Y and Q of Yule and Goodman are symmetric measures (same
evaluation of the symmetric rules $X \rightarrow Y$ and $Y \rightarrow X$:
P1 = 0), while Zhang and MGK are not (different evaluation of the symmetric
rules $X \rightarrow Y$ and $Y \rightarrow X$: P1 = 1).

Properties P14 (opposed values or not for the rules $X \rightarrow Y$ and
$\overline{X} \rightarrow Y$) and P16 (identical values or not for the rules
$X \rightarrow Y$ and $\overline{X} \rightarrow \overline{Y}$) also help to
explain these two proximities. The indices Y, Q and Goodman take opposite
values for the rules $X \rightarrow Y$ and $\overline{X} \rightarrow Y$ and
identical values for the rules $X \rightarrow Y$ and
$\overline{X} \rightarrow \overline{Y}$. Zhang and MGK satisfy neither of
these two properties.
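
These relations between a rule and its symmetric or negated variants can be illustrated on one contingency table. The following sketch assumes the usual definitions of Yule's Q and of Zhang's measure; the cell counts are a hypothetical example (with $n_x = 174$, $n_y = 400$, $n = 600$ and 150 examples), and each variant of the rule is obtained by permuting the four cells.

```python
def yule_q(a, b, c, d):
    """Yule's Q from a=n(X,Y), b=n(X,not Y), c=n(not X,Y), d=n(not X,not Y)."""
    return (a * d - b * c) / (a * d + b * c)

def zhang(a, b, c, d):
    """Zhang's measure for the rule whose cells are (a, b, c, d)."""
    n = a + b + c + d
    p_xy, p_x, p_y = a / n, (a + b) / n, (a + c) / n
    return (p_xy - p_x * p_y) / max(p_xy * (1 - p_y), p_y * (b / n))

# Hypothetical table: 150 examples, 24 counter-examples.
a, b, c, d = 150, 24, 250, 176

rules = {
    "X -> Y": (a, b, c, d),
    "Y -> X": (a, c, b, d),            # premise and conclusion exchanged
    "not X -> Y": (c, d, a, b),        # premise negated
    "not X -> not Y": (d, c, b, a),    # premise and conclusion negated
}
for name, cells in rules.items():
    print(f"{name:15s} Q = {yule_q(*cells):+.4f}  Zhang = {zhang(*cells):+.4f}")
```

On this table, Q is unchanged for $Y \rightarrow X$ and $\overline{X} \rightarrow \overline{Y}$ and changes sign for $\overline{X} \rightarrow Y$, whereas Zhang takes a different magnitude for each variant.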

We now study class C4.

9.2. Study of the C4 class

Class C4 contains the following indices: Accuracy (Tan et al, 2002), Jaccard
(Jaccard, 1908), Support (Russell and Rao, 1940), Cosine (Ochiai, 1957),
Recall (Lavrac et al, 1999), Causal dependency (Tan et al, 2002), Causal
confidence (Kodratoff, 2001), Causal-confirm confidence (Kodratoff, 2001),
Negative reliability (Lavrac et al, 1999), Leverage (Piatetsky-Shapiro, 1991),
Specificity (Tan et al, 2002), Czekanowski-Dice (Czekanowski, 1913) and
Kulczynski (Kulczynski, 1928).

From Table 3, these 14 measures satisfy the following 12 properties:

— Non symmetry in the sense of conclusion negation (P2 = 1),

— Discriminant in the case of large data (P19 = 1),

— No fixed value in the independence (P7 = 0) and equilibrium (P9 = 0) cases,

— Unidentifiable values in the case of attraction (P10 = 0) and repulsion
(P11 = 0),

— Non-invariant in the case of expansion of certain numbers (P13 = 0),

— Two relations between the different negative rules are not present (P14 = 0)
(P15 = 0),

— Not based on a probabilistic model (P17 = 0),

— Descriptive measures (P18 = 0).

Let us now study the properties satisfied by all the measures but one:

— Growth according to the number of examples (P4 = 1), with the exception of
Support,

— Growth according to the size of the conclusion (P6 = 1), with the exception
of Support,

— The measures do not give the same value to the rules $X \rightarrow Y$ and
$\overline{X} \rightarrow \overline{Y}$ (P16 = 0), with the exception of
Accuracy.

Given the relatively large number of measures in this class (it is the largest
class), it is difficult to find a semantic as precise as for the previous
class C6. However, we can give one to a smaller set of measures: Jaccard,
Support, Cosine, Czekanowski-Dice, Kulczynski and Recall. These measures are
functions of P(XY) and are symmetric (with the exception of Recall). We recall
the expressions of these six measures:

Jaccard: $\frac{P(XY)}{P(X) + P(Y) - P(XY)} = \frac{P(XY)}{P(X\overline{Y}) + P(Y)}$

Support: $P(XY)$

Cosine: $\frac{P(XY)}{\sqrt{P(X)P(Y)}}$

Czekanowski-Dice: $\frac{2P(XY)}{P(X) + P(Y)}$

Kulczynski: $\frac{P(XY)}{P(X\overline{Y}) + P(\overline{X}Y)}$

Recall: $\frac{P(XY)}{P(Y)}$

Table 5. Evaluation of properties on a subset of measures of Class 4.

We can then deduce that these measures take a fixed value, equal to 0, in the
case of incompatibility (P(XY) = 0). These formulas also explain why the
measures do not grow with the dataset size (P5 = 0), as shown in Table 5,
which lists the properties satisfied by these six measures. All of them except
Support are invariant with respect to the size n of the dataset: increasing n
with the other counts fixed only adds individuals that satisfy neither X nor
Y, i.e. it increases $P(\overline{X}\,\overline{Y})$, on which these measures
do not depend. As for Support, it decreases as the size n of the whole dataset
grows.
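
The invariance is easiest to see by rewriting the six measures with counts instead of probabilities. The rewriting below is a short worked derivation added for illustration; it only substitutes $P(XY) = n_{xy}/n$, $P(X) = n_x/n$ and $P(Y) = n_y/n$ into the usual definitions of these measures.

```latex
\begin{align*}
\text{Jaccard} &= \frac{n_{xy}}{n_x + n_y - n_{xy}}, &
\text{Cosine} &= \frac{n_{xy}}{\sqrt{n_x n_y}}, &
\text{Czekanowski-Dice} &= \frac{2 n_{xy}}{n_x + n_y}, \\
\text{Kulczynski} &= \frac{n_{xy}}{n_x + n_y - 2 n_{xy}}, &
\text{Recall} &= \frac{n_{xy}}{n_y}, &
\text{Support} &= \frac{n_{xy}}{n}.
\end{align*}
```

The size n cancels out everywhere except in Support, which is therefore the only one of the six measures that changes (and decreases) when n grows with $n_x$, $n_y$ and $n_{xy}$ fixed.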

As with the previous class C6, we study the evolution of these measures
according to the number of examples. Figure 4 shows this evolution. We kept
the same cardinalities as above for the premise, the conclusion and the whole
dataset ($n_x = 174$, $n_y = 400$ and $n = 600$).

We can check that these measures take the value 0 in the case of
incompatibility. We obtain two types of curves:

— A straight line for the measures Support, Cosine, Czekanowski-Dice and Recall,

— A half-parabola for the measures Jaccard and Kulczynski.
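
The two types of curves can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: it uses the usual definitions of the six measures, the helper names `c4_measures` and `second_difference` are hypothetical, and a discrete second difference stands in for the visual inspection of the curves.

```python
import math

def c4_measures(n_xy, n_x=174, n_y=400, n=600):
    """Six measures of class C4 for the rule X -> Y, computed from counts."""
    p_xy, p_x, p_y = n_xy / n, n_x / n, n_y / n
    p_x_not_y = (n_x - n_xy) / n
    p_not_x_y = (n_y - n_xy) / n
    return {
        "Jaccard": p_xy / (p_x + p_y - p_xy),
        "Support": p_xy,
        "Cosine": p_xy / math.sqrt(p_x * p_y),
        "Czekanowski-Dice": 2 * p_xy / (p_x + p_y),
        "Kulczynski": p_xy / (p_x_not_y + p_not_x_y),
        "Recall": p_xy / p_y,
    }

def second_difference(name, n_xy, step=10):
    """Discrete curvature of a measure around n_xy (0 for a straight line)."""
    left = c4_measures(n_xy - step)[name]
    mid = c4_measures(n_xy)[name]
    right = c4_measures(n_xy + step)[name]
    return right - 2 * mid + left

for name in c4_measures(0):
    curvature = second_difference(name, 80)
    shape = "straight line" if abs(curvature) < 1e-12 else "curved"
    print(f"{name:17s} value at n_xy=0: {c4_measures(0)[name]:.1f}  shape: {shape}")
```

With the sizes used for Figure 4, all six measures start at 0, and only Jaccard and Kulczynski show a non-zero (positive) curvature.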

Having studied some classes in more detail and tried to interpret them, we now
validate our work by comparing it with existing classifications (Vaillant,
2006), (Lesot and Rifqi, 2010), (Zighed et al, 2011), (Heravi and Zaiane,
2010).

10. Validation 

Many studies have been carried out to better understand the behavior of
interestingness measures. In this section, we compare the clusters of measures
we obtained with previous works (Vaillant, 2006), (Huynh et al, 2007), (Y. Le
Bras, 2011), (Lesot and Rifqi, 2010), (Zighed et al, 2011) in order to
highlight the similarities and differences between them.

10.1. Comparison with the work of B. Vaillant 

We first compare the classification we obtained with that of Benoit Vaillant
(Vaillant, 2006), who studied 20 measures according to 9 formal properties. Of
these nine properties, we have 7 in common, since "comprehensibility of the
measure" and "easiness to set a threshold of acceptance" are considered too
subjective. To perform his classification, Benoit Vaillant also used the Ward
criterion, but with the Manhattan distance. The author points out that he
obtained similar results with other criteria. He identified the following five
classes:

— ClBV1 = {Support, Least contradiction, Laplace},

— ClBV2 = {Confidence, Sebag, Examples rate},

— ClBV3 = {Correlation coefficient, Piatetsky-Shapiro, Pavilion, Interest,
Implication index, Cohen, Informational gain},

— ClBV4 = {Loevinger, Bayes factor, Conviction} and

— ClBV5 = {Zhang, IIET, Intensity of implication, Probabilistic Discriminant
index}.

We can identify the IIET measure with REII, because both have the same
purpose.

We agree on the following groupings:
ClBV2 ⊂ C5, ClBV4 ⊂ C7,

and we have the following relations between groups: ClBV1 \ {Support} ⊂ C5,
ClBV3 \ {Implication index} ⊂ Gp8 ∪ C7 and ClBV5 \ {Zhang} ⊂ C1 ∪ C2.

The grouping on which we disagree most is ClBV3, since it involves the Gp8
group, which is obtained with only one technique: a version of k-means. As for
the ClBV5 group, all its measures except Zhang belong to the intensity of
implication family.

Fig. 4. Evolution of the six measures of the C4 class according to the number
of examples.

We studied 12 additional properties, which explains why we do not find all 
the results of Benoit Vaillant. 

In the following, we compare our results with those obtained by Y. Le Bras
(Y. Le Bras, 2011).

10.2. Comparison with the work of Y. Le Bras

In his work, Y. Le Bras (Y. Le Bras, 2011) seeks common characteristics of
objective measures. To that end, he studied 42 interestingness measures
according to six operational criteria that he proposed. These criteria
concern, on the one hand, the possibility of computing robustness and, on the
other hand, the use of efficient algorithms. The criteria are listed below:

— Computation of the robustness of a measure, i.e. how well a rule resists
disturbance of the database (Le Bras et al, 2010):

1. Planar measure: for some measures, the distance calculation reduces to the
calculation of the distance to a plane, which provides an exact algebraic
solution;

2. Quadratic measure: these measures require a certain number of mathematical
tools.

— Algorithmic properties that make algorithms efficient:

3. GUEUC: the general UEUC (Universal Existential Upward Closure) property,
which is a downward monotonicity property;

4. Omni-monotony of the measure;

5. Opti-monotony of the measure;

— Anti-monotony property of a measure for finding optimal rules:

6. Anti-monotony of the measure.

For each of the algorithmic properties, the author provided a generalization
(GUEUC, omni-monotony and opti-monotony), together with conditions of
existence for these generalizations.

Looking at the six properties described above, we see that the criteria chosen
by Y. Le Bras to study the behavior of measures are entirely different from
ours. Nevertheless, this does not prevent us from comparing the two works for
a better understanding of the behavior of the measures. In total, we have 38
measures in common, some of which have the same definition under different
names (see footnote 3). By comparing our works, we seek to determine whether
the common measures that belong to the same group evaluate the properties
studied by (Y. Le Bras, 2011) in the same way.

The comparison of our results (Section 8) with those obtained by Y. Le Bras
reveals similarities within the following groups of measures.

— C3: groups the Coverage, Gini, Implication index, J-measure, Prevalence and
Pearl measures (Pearl belongs to this group according to k-means), which are
common to both works. According to (Y. Le Bras, 2011), none of these measures
is quadratic or anti-monotonic. He also shows the closeness of the Coverage
and Prevalence measures, since they are the only two planar and omni-monotonic
measures that have the GUEUC property;

— C4: contains the following common measures: Cosine, Czekanowski-Dice,
Jaccard, Kulczynski, Accuracy, Specificity, Support and Recall. All of them
except Cosine, which is quadratic, are planar and satisfy the anti-monotony
property. Furthermore, most of these measures have the GUEUC property, except
Kulczynski and Specificity. Support is the only omni-monotonic measure in this
cluster;

— C5: Descriptive confirmation is the only measure of this group that is
absent. The work of Y. Le Bras (Y. Le Bras, 2011) reveals that Examples rate,
Sebag, Ganascia and Confidence evaluate all the studied properties in the same
way. Moreover, none of the common measures of C5 is quadratic, but they are
all omni-monotonic and opti-monotonic. All these measures are also planar,
except Laplace, and only two of them (Least contradiction and Laplace) are
anti-monotonic;

— C6: we find three opti-monotonic measures, Y and Q of Yule and Zhang, which
satisfy none of the anti-monotony, omni-monotony and planarity properties.
Looking at the behavior of the Piatetsky-Shapiro and Novelty measures, which
belong to this group according to the hierarchical method, we find that they
are also opti-monotonic and satisfy neither the omni-monotony nor the
planarity property. Novelty, which seems to be more robust than
Piatetsky-Shapiro (it is quadratic), is the only measure that has the good
property of anti-monotonicity in the case of class rules;

— C7: all the measures of C7 have been studied by (Y. Le Bras, 2011),
including Collective strength, Cohen and Odds ratio, which belong to C7
according to the hierarchical method. Among all these measures, only Cohen is
anti-monotonic, and none of them is omni-monotonic or planar. The GUEUC
property is satisfied by Pavilion, Conviction, Bayes factor, Informational
gain, Interest and Loevinger, which are quadratic and opti-monotonic and thus
share strong operational properties with the Cohen, Odds ratio and Relative
risk measures.

3 Interest represents Pearl in our work, Levier represents the Novelty measure
and J1-measure is the Two-way support measure.

This comparison shows that the study of interestingness measures carried out
by Y. Le Bras according to his six criteria reveals similar behavior among the
common measures of a same group. The only group that does not show a good
agreement is C3.

Another classification of interestingness measures, obtained by (Huynh et al,
2007) from datasets, is presented in the next section and compared with the
classification obtained in Section 8.

10.3. Comparison with the work of Huynh et al.

Another classification was made by Huynh et al. (Huynh et al, 2007), who
studied 36 interestingness measures, 32 of which are common to our work, on 2
datasets of opposite nature: a highly correlated one (mushroom) and a weakly
correlated synthetic one (T5.I2.D10K). The authors first present a taxonomy of
measures according to the following 2 criteria:

1. Topic: deviation from independence or equilibrium; 

2. Nature: descriptive or statistical. 

From the study of these two criteria on datasets, the following 5 groups of
measures are retained:

— Clde (descriptive / deviation from equilibrium): {Confidence, Laplace, Sebag,
Examples rate, Descriptive confirmation, Descriptive confirmed-confidence,
Least contradiction};

— Cldi (descriptive / deviation from independence): {Correlation, Interest,
Loevinger, Conviction, Dependency, Pavilion, J-measure, Gini, TIC, Collective
strength, Odds ratio, Yule's Q, Yule's Y, Klosgen, Cohen};

— Clse (statistical / deviation from equilibrium): {IPEE};

— Clsi (statistical / deviation from independence): {II, EII, EII2, Lerman,
Interest Rule};

— Cl (other): {Support, Precision, Jaccard, Cosine, Causal confidence, Causal
confirmation, Causal confirmed-confidence, Causal dependency}.

Comparing these 5 groups of measures with those described in Figure 2, we
agree on the categorization of the following measures: {Confidence, Laplace,
Sebag, Examples rate, Least contradiction} ⊂ C5; {Correlation, Cohen,
Collective strength, Odds ratio} ⊂ Gp8, since they are gathered by the k-means
partitioning method; {Gini, J-measure, Dependence, Klosgen} ⊂ C3; {Interest,
Loevinger, Conviction, Pavilion, Klosgen} ⊂ C7; {Yule's Q, Yule's Y} ⊂ C6; and
finally {Jaccard, Cosine, Causal confirmation, Causal confidence, Causal
confirmed-confidence} ⊂ C4. This comparison highlights the similarities
between the groups of common measures found by both works.

10.4. Comparison with other works 

Another classification was performed on distance and similarity measures by
Marie-Jeanne Lesot and Maria Rifqi (Lesot and Rifqi, 2010). The authors
studied the order induced by the measures rather than the numerical values
obtained, since their context of study is information retrieval. Their study
focused on measures dedicated to binary and numerical data, with experiments
on both real and artificial data. The authors obtained a list of equivalent
measures (measures that always induce the same order) and, for non-equivalent
measures, they quantified the disagreement by a degree of equivalence based on
the generalized Kendall coefficient. Of the 10 measures they studied that are
designed for binary data, five are common to our two studies: Czekanowski-Dice,
Jaccard, Ochiai, Yule's Y and Yule's Q. The authors found that Yule's Y and
Yule's Q are equivalent measures. This result is confirmed by our study, since
these two measures are in the same class C6, as already mentioned, and are
very close in the dendrogram of Figure 1. They also found that
Czekanowski-Dice and Jaccard are equivalent measures. We also assigned both
measures to the same class, C4, and they are relatively close in the
dendrogram of Figure 1 (we chose Cosine as the representative measure on the
dendrogram, as discussed earlier: Cosine and Czekanowski-Dice have identical
values for the 19 properties, which led to the formation of the G3 group).
Finally, we also grouped the Ochiai (or Cosine) measure with Czekanowski-Dice
and Jaccard in cluster C4. The authors (Lesot and Rifqi, 2010) found a degree
of equivalence of 0.99 between the Ochiai measure and the equivalence class
{Czekanowski-Dice, Jaccard}, which confirms our results.
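
The notion of equivalent measures (measures that always induce the same order) can be illustrated on random data. The sketch below is only an illustration under stated assumptions: the share of discordant pairs is a simple proxy, not the generalized Kendall coefficient used by (Lesot and Rifqi, 2010), the contingency counts are randomly generated, and all function names are hypothetical. Exact fractions are used so that ties are detected without rounding errors.

```python
import random
from fractions import Fraction
from itertools import combinations

def jaccard(a, b, c):
    return Fraction(a, a + b + c)

def dice(a, b, c):
    return Fraction(2 * a, 2 * a + b + c)

def ochiai_squared(a, b, c):
    # Squaring keeps the order of non-negative values and avoids square roots.
    return Fraction(a * a, (a + b) * (a + c))

def discordance(f, g, tables):
    """Share of pairs of tables that f and g rank in a different order."""
    sign = lambda x: (x > 0) - (x < 0)
    pairs = list(combinations(tables, 2))
    bad = sum(sign(f(*s) - f(*t)) != sign(g(*s) - g(*t)) for s, t in pairs)
    return Fraction(bad, len(pairs))

random.seed(0)
# Hypothetical data: (a, b, c) = counts of XY, X without Y, Y without X.
tables = [tuple(random.randint(1, 50) for _ in range(3)) for _ in range(200)]

print("Jaccard vs Dice  :", float(discordance(jaccard, dice, tables)))
print("Jaccard vs Ochiai:", float(discordance(jaccard, ochiai_squared, tables)))
```

Jaccard and Czekanowski-Dice never disagree on the order, because Dice = 2J / (1 + J) is an increasing function of the Jaccard value J; Ochiai is not a function of J alone, so some pairs may be ranked differently.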

A final classification was proposed by Djamel Zighed, Rafik Abdesselam and
Ahmed Bounekkar (Zighed et al, 2011) on 13 proximity measures. Only two of
them are common to our two studies: Cosine and Correlation coefficient. The
classification they proposed is based on topological equivalence and uses the
structure of the local neighborhood. Both measures appear very close in this
classification, in contrast to our work, where we find them in classes C4 and
Gp8. Since the sets of measures studied are so different, the classes found by
each approach are difficult to compare. Moreover, as the authors emphasized
when presenting their work, the classification they obtained is not very
representative, because it was computed on a single dataset: Fisher's Iris.

We are well aware that the categorization of measures may also depend on
several factors, including the data, the expert user, the nature of the
extracted rules and the procedure used to search for classes, as highlighted
by (Suzuki, 2008). To avoid the biases due to the data, the expert and the
nature of the extracted rules, we have chosen here a theoretical study based
on the properties of the measures rather than on experimental data (Huynh et
al, 2005). Both approaches are obviously complementary.

To avoid the bias of the cluster construction procedure, we used two
classification techniques, which generally exhibited strong similarities
between many measures, and we highlighted the similarities and differences
with previous works (Vaillant, 2006), (Lesot and Rifqi, 2010), (Zighed et al,
2011). This study complements previous work on a unifying view of
interestingness measures (Hebert and Cremilleux, 2007) and adds a further
contribution to the analysis of these measures.



11. Conclusion 

This article takes as its starting point a synthesis paper on the
interestingness measures used in the literature to extract knowledge and on
the properties judged relevant for them. This synthesis work led to the
assessment of 19 properties judged interesting on 61 measures. The objective
of this paper is to classify these measures in order to assist users in
choosing measures that complement the couple (Support, Confidence) to
eliminate uninteresting rules. We first analyzed these data (a matrix of 61
measures x 19 properties) to determine whether a simplification was feasible,
looking first for groups of measures with completely identical behavior and
then checking whether some properties were redundant. We detected seven groups
of measures with completely identical behavior, which reduced the starting
data for the classification carried out with two techniques: an agglomerative
hierarchical classification method and a version of the k-means method. The
classifications obtained with both techniques led to a consensus: 7 classes,
which were partially validated by existing classifications.
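
Detecting groups of measures with completely identical behavior amounts to grouping the rows of the measures x properties matrix that are equal. A minimal sketch is given below; the function name `identical_behavior_groups` is hypothetical and the property values are invented for illustration (only the fact that Cosine and Czekanowski-Dice share the same values comes from the study).

```python
from collections import defaultdict

def identical_behavior_groups(matrix):
    """Group measures whose property vectors are exactly the same."""
    groups = defaultdict(list)
    for measure, properties in matrix.items():
        groups[tuple(properties)].append(measure)
    # Keep only the groups that contain at least two measures.
    return [names for names in groups.values() if len(names) > 1]

# Invented property values (the real matrix has 61 rows and 19 columns).
toy_matrix = {
    "Cosine":           [1, 0, 1, 0],
    "Czekanowski-Dice": [1, 0, 1, 0],
    "M1":               [1, 0, 0, 0],
    "M2":               [0, 1, 1, 1],
    "M3":               [0, 1, 1, 1],
}
print(identical_behavior_groups(toy_matrix))
```

Keeping one representative per group then gives the reduced matrix on which the two clustering techniques are applied.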

In the future, we would like to consolidate the classes of measures we
obtained by comparing the N best rules extracted from different databases by
each of the studied measures, in order to check that this set of N best rules
is substantially the same within each class. It would also be interesting to
consider smaller classes (with the help of the extracted dendrogram) in order
to assign a semantic to each of them, which would be a greater help to the
user than a set of verified properties, since we were unable to define each of
the extracted classes in a few words or phrases. Complementary properties
might also be considered. Finally, the notion of robustness of association
rules (Le Bras et al, 2010) could also be taken into account in the
categorization of interestingness measures.

Acknowledgements. We thank Israël-César Lerman for his constructive comments on
this article. Moreover, this work is partially supported by the French-Tunisian
PHC Utique 11G1417: EXQUI (Extraction, QUality and Knowledge Engineering in
hetero-

No. Measure: Formula

1. Correlation coefficient: $\frac{p(XY) - p(X)p(Y)}{\sqrt{p(X)p(Y)p(\overline{X})p(\overline{Y})}}$

2. Cohen or Kappa: $\frac{2\,(p(XY) - p(X)p(Y))}{p(X) + p(Y) - 2p(X)p(Y)}$

3. Confidence or precision: $\frac{p(XY)}{p(X)}$

4. Causal Confidence: [...]

5. Centered Confidence or Pavilion: $p(\overline{Y}) - \frac{p(X\overline{Y})}{p(X)}$

6. Descriptive Confirm Confidence or Ganascia: $1 - 2\,\frac{p(X\overline{Y})}{p(X)}$

7. Causal Confirm Confidence: [...]

8. Causal Confirm: $p(X) + p(\overline{Y}) - 4p(X\overline{Y})$

9. Descriptive Confirm: $p(XY) - p(X\overline{Y})$

10. Conviction: $\frac{p(X)p(\overline{Y})}{p(X\overline{Y})}$

11. Cosine or Ochiai: $\frac{p(XY)}{\sqrt{p(X)p(Y)}}$

12. Coverage: $p(X)$

13. Czekanowski-Dice or F-measure: $\frac{2p(XY)}{p(X) + p(Y)}$

14. Dependency: $\left| p(\overline{Y}) - \frac{p(X\overline{Y})}{p(X)} \right|$

15. Putative Causal Dependency: [...]

16. Gray and Orlowska's Interestingness Weighting Dependency: [...]

17. Bayes factor or Odd multiplier: $\frac{p(XY)\,p(\overline{Y})}{p(X\overline{Y})\,p(Y)}$

18. Certainty factor or Loevinger or Satisfaction: $\frac{p(XY) - p(X)p(Y)}{p(X)p(\overline{Y})}$

19. Negative reliability: $\frac{p(\overline{X}\,\overline{Y})}{p(\overline{Y})}$

20. Collective Strength: $\frac{p(XY) + p(\overline{X}\,\overline{Y})}{p(X)p(Y) + p(\overline{X})p(\overline{Y})} \times \frac{1 - p(X)p(Y) - p(\overline{X})p(\overline{Y})}{1 - p(XY) - p(\overline{X}\,\overline{Y})}$

21. Fukuda: [...]

22. Informational gain: $\log_2 \left( \frac{p(XY)}{p(X)p(Y)} \right)$

23. Gini: $p(X)\left(\frac{p^2(XY)}{p^2(X)} + \frac{p^2(X\overline{Y})}{p^2(X)}\right) + p(\overline{X})\left(\frac{p^2(\overline{X}Y)}{p^2(\overline{X})} + \frac{p^2(\overline{X}\,\overline{Y})}{p^2(\overline{X})}\right) - p^2(Y) - p^2(\overline{Y})$

24. Goodman-Kruskal: $\frac{\sum_j \max_k P(X_j Y_k) + \sum_k \max_j P(X_j Y_k) - \max_j P(X_j) - \max_k P(Y_k)}{2 - \max_j P(X_j) - \max_k P(Y_k)}$

25. Implication index: $\sqrt{n}\,\frac{p(X\overline{Y}) - p(X)p(\overline{Y})}{\sqrt{p(X)p(\overline{Y})}}$

26. Probabilistic intensity of deviation from equilibrium (IPEE): [...]

27. Entropic probabilistic intensity of deviation from equilibrium (IP3E): [...]

28. Probabilistic discriminant index (PDI): $P[\mathcal{N}(0,1) > II^{CR/B}]$, where $II^{CR/B}$ indicates that II is centred and reduced according to the values taken by II on the set of extracted rules

29. Mutual Information: $\frac{[\ldots]}{-P(X)\log_2 P(X) - P(\overline{X})\log_2 P(\overline{X})}$

30. Intensity of Implication (II): $P[\mathrm{Poisson}(nP(X)P(\overline{Y})) > P(X\overline{Y})]$

31. Entropic intensity of implication (IIE): [...]

32. Entropic intensity of revised implication (IIER): [...]

33. Likelihood discriminant index: $P[\mathrm{Poisson}(nP(X)P(Y)) < P(XY)]$

34. Interest or Lift: [...]

35. Jaccard: $\frac{p(XY)}{p(X\overline{Y}) + p(Y)}$

36. J-Measure: $p(XY)\log\frac{p(XY)}{p(X)p(Y)} + p(X\overline{Y})\log\frac{p(X\overline{Y})}{p(X)p(\overline{Y})}$

37. Klosgen: $\sqrt{p(XY)}\left(\frac{p(XY)}{p(X)} - p(Y)\right)$

38. Kulczynski or Agreement and disagreement index: $\frac{p(XY)}{p(X\overline{Y}) + p(\overline{X}Y)}$

39. Laplace: $\frac{n\,p(XY) + 1}{n\,p(X) + 2}$

40. Leverage: $\frac{p(XY)}{p(X)} - p(X)p(Y)$

41. MGK: if $P(Y/X) > P(Y)$ then $M_{GK}(X \rightarrow Y) = \frac{P(Y/X) - P(Y)}{1 - P(Y)}$, else $M_{GK}(X \rightarrow Y) = \frac{P(Y/X) - P(Y)}{P(Y)}$

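42. Least contradiction or Surprise: $\frac{p(XY) - p(X\overline{Y})}{p(Y)}$

43. Novelty: $p(XY) - p(X)p(Y)$

44. Pearl: $p(X)\left|\frac{p(XY)}{p(X)} - p(Y)\right|$

45. Piatetsky-Shapiro: $n \times (p(XY) - p(X)p(Y))$

46. Accuracy: $p(XY) + p(\overline{X}\,\overline{Y})$

47. Prevalence: $p(Y)$

48. Yule's Q: $\frac{p(XY)p(\overline{X}\,\overline{Y}) - p(X\overline{Y})p(\overline{X}Y)}{p(XY)p(\overline{X}\,\overline{Y}) + p(X\overline{Y})p(\overline{X}Y)}$

49. Recall: $\frac{p(XY)}{p(Y)}$

50. Odds Ratio: $\frac{p(XY)p(\overline{X}\,\overline{Y})}{p(X\overline{Y})p(\overline{X}Y)}$

51. Relative Risk: [...]

52. Sebag-Schoenauer: $\frac{p(XY)}{p(X\overline{Y})}$

53. Specificity: $\frac{p(\overline{X}\,\overline{Y})}{p(\overline{X})}$

54. Support or Russel and Rao index: $p(XY)$

55. Yao and Liu's One Way Support: $\frac{p(XY)}{p(X)}\log_2\frac{p(XY)}{p(X)p(Y)}$

56. Yao and Liu's Two Way Support: [...]

57. Examples and counter-examples rate: $\frac{p(XY) - p(X\overline{Y})}{p(XY)}$

58. Test value VT100: $\Phi^{-1}\left(P[\mathrm{Hypergeometric}([\ldots]\,P(X)P(Y)) < P(XY)]\right)$

59. Yao and Liu's Two Way Support Variation: [...]

60. Yule's Y: $\frac{\sqrt{p(XY)p(\overline{X}\,\overline{Y})} - \sqrt{p(X\overline{Y})p(\overline{X}Y)}}{\sqrt{p(XY)p(\overline{X}\,\overline{Y})} + \sqrt{p(X\overline{Y})p(\overline{X}Y)}}$

61. Zhang: $\frac{p(XY) - p(X)p(Y)}{\max\left(p(XY)p(\overline{Y}),\; p(Y)p(X\overline{Y})\right)}$

Table 6: Definition of 61 measures.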